Discovering Hidden Relationships via Efficient Co-clustering of Sparse Matrices
نویسندگان
چکیده
We present a new approach for discovering hidden relationships in sparse bipartite data by identifying large, dense biclusters in the corresponding matrix. We motivate a new class of metrics to measure the quality of a bicluster partition, and compare them to existing metrics. We then present a heuristic algorithm that efficiently searches the space of possible co-clusterings for one which maximizes the value of a given metric. We evaluate our approach with experiments on synthetic and real-world datasets.
منابع مشابه
Feature Enriched Nonparametric Bayesian Co-clustering
Co-clustering has emerged as an important technique for mining relational data, especially when data are sparse and high-dimensional. Coclustering simultaneously groups the different kinds of objects involved in a relation. Most co-clustering techniques typically only leverage the entries of the given contingency matrix to perform the two-way clustering. As a consequence, they cannot predict th...
متن کاملCoronavirus: Discover the Structure of Global Knowledge, Hidden Patterns & Emerging Events
Background & Objective: The present study aimed at exploring the structure of global knowledge, hidden patterns, and emerging Coronavirus events using co-word techniques. Co-word analysis is one of the most efficient scientific methods to analyze the structure and dynamics of knowledge and the general state of research. Materials & Methods: This applied research performed using Co-word anal...
متن کاملImage Classification via Sparse Representation and Subspace Alignment
Image representation is a crucial problem in image processing where there exist many low-level representations of image, i.e., SIFT, HOG and so on. But there is a missing link across low-level and high-level semantic representations. In fact, traditional machine learning approaches, e.g., non-negative matrix factorization, sparse representation and principle component analysis are employed to d...
متن کاملData mining using the crossing minimization paradigm
Our ability and capacity to generate, record and store multi-dimensional, apparently unstructured data is increasing rapidly, while the cost of data storage is going down. The data recorded is not perfect, as noise gets introduced in it from different sources. Some of the basic forms of noise are incorrect recording of values and missing values. The formal study of discovering useful hidden inf...
متن کاملA Survey on Novel Graph Based Clustering and Visualization Using Data Mining Algorithm
As the amount of data stored by various information systems grows very fast, there is an urgent need for a new generation of computational techniques and tools to assist humans in extracting useful information (knowledge) from the rapidly growing volume of data. Data mining (DM) is one of the most useful methods for exploring large data sets. Clustering, as a special area of data mining, is one...
متن کامل